Performance analysis of methods to infer missing genotypes, Christine Sinoquet (HAL no. 00326741, October 2008)

Author

  • Christine Sinoquet
Abstract

Complex analyses such as genetic mapping, disease association studies, and disease mapping in environmental health and environmental epidemiology studies rely on high-throughput genotyping techniques. These analyses thoroughly examine genetic variation between subjects, in particular through Single Nucleotide Polymorphisms (SNPs). Although genotyping techniques now meet high quality standards, one still has to cope with missing data and genotyping errors. Typically, the percentage of missing data (missing calls) now lies in the interval [5%, 10%]. Computational inference of missing data is a challenging alternative to re-genotyping the missing regions. This document first briefly reviews the various methods designed to infer missing SNPs. It then reports the performances published for these inference methods, carefully describing the characteristics of the benchmarks generated by the methods' designers (missing data percentage, correlation between SNPs). We show that most methods provide accuracies in the range [90%, 96%]. However, we also emphasize that no algorithm guarantees consistently high accuracy: an algorithm may perform well on some benchmarks and show relatively poor results on others.

inria-00326741, version 2 - 5 Oct 2008
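As an illustration of how such a benchmark might be set up, the sketch below hides a chosen percentage of calls in a genotype matrix, imputes them, and measures accuracy as the fraction of hidden calls recovered. Everything in it is an assumption made for illustration: the 0/1/2 genotype coding, the function names, the synthetic data, and the per-SNP most-frequent-genotype baseline, which is not one of the methods reviewed in the report.

```python
import random
from collections import Counter

MISSING = None  # hypothetical marker for a missing call


def mask_genotypes(genotypes, missing_rate, rng):
    """Return a copy with about `missing_rate` of the calls hidden, plus their positions."""
    masked = [row[:] for row in genotypes]
    cells = [(i, j) for i in range(len(genotypes)) for j in range(len(genotypes[0]))]
    hidden = rng.sample(cells, round(missing_rate * len(cells)))
    for i, j in hidden:
        masked[i][j] = MISSING
    return masked, hidden


def impute_by_mode(masked):
    """Naive baseline: fill each missing call with the most frequent observed genotype at that SNP."""
    imputed = [row[:] for row in masked]
    for j in range(len(masked[0])):
        observed = [row[j] for row in masked if row[j] is not MISSING]
        fill = Counter(observed).most_common(1)[0][0] if observed else 0
        for row in imputed:
            if row[j] is MISSING:
                row[j] = fill
    return imputed


def accuracy(truth, imputed, hidden):
    """Fraction of hidden calls that were recovered exactly (requires at least one hidden call)."""
    if not hidden:
        raise ValueError("no hidden calls to evaluate")
    correct = sum(truth[i][j] == imputed[i][j] for i, j in hidden)
    return correct / len(hidden)


# Synthetic example: 100 subjects x 50 SNPs, genotypes coded 0/1/2 (assumed coding).
rng = random.Random(0)
truth = [[rng.choice([0, 0, 0, 1, 2]) for _ in range(50)] for _ in range(100)]
masked, hidden = mask_genotypes(truth, 0.10, rng)  # 10% missing calls
acc = accuracy(truth, impute_by_mode(masked), hidden)
print(f"hidden calls: {len(hidden)}, accuracy: {acc:.3f}")
```

Because the synthetic SNPs here are independent, this baseline cannot exploit correlation between SNPs, which is one of the benchmark characteristics the report describes.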


Similar resources

Improvement of missing genotype imputation through bi-directional parsing of large SNP panels, Christine Sinoquet

Difficult analyses such as disease association studies, which aim at mapping genetic variants underlying complex human diseases, rely on high-throughput genotyping techniques. However, a shortcoming of these techniques is the generation of missing calls. Computational inference of missing data is a challenging alternative to re-genotyping the missing regions. In this paper, we prese...


An asymptotic test for Quantitative Trait Locus detection in presence of missing genotypes

HAL is a multi-disciplinary open access archive for the deposit and dissemination of scientific research documents, whether they are published or not. The documents may come from teaching and research institutions in France or abroad, or from public or private research centers.


Genetic Diversity Analysis of Highly Incomplete SNP Genotype Data with Imputations: An Empirical Assessment

Genotyping by sequencing (GBS) recently has emerged as a promising genomic approach for assessing genetic diversity on a genome-wide scale. However, concerns are not lacking about the uniquely large unbalance in GBS genotype data. Although some genotype imputation has been proposed to infer missing observations, little is known about the reliability of a genetic diversity analysis of GBS data, ...


Maximal sub-triangulation in pre-processing phylogenetic data

In order to help infer an evolutionary tree (phylogeny) from experimental data, we propose a new method for pre-processing the corresponding dissimilarity matrix, which is related to the property that the distance matrix of a phylogeny (called an additive matrix) describes a sandwich family of chordal graphs. As experimental data often yield distance values which are known to be under-estimated...


Genetic diversity analysis of highly incomplete SNP genotype data with imputations: an empirical assessment

Genotyping by sequencing (GBS) has recently emerged as a promising genomic approach for assessing genetic diversity on a genome-wide scale. However, concerns are not lacking about the uniquely large unbalance in GBS genotype data. While some genotype imputation has been proposed to infer missing observations, little is known about the reliability of a genetic diversity analysis o...




Publication date: 2008